Abstract

PageRank, the prestige measure for Web pages used by Google, is the stationary probability of
a peculiar random walk on directed graphs, which interpolates between a pure random walk and a
process where all nodes have the same probability of being visited. We give some exact results on the
distribution of PageRank in the cases in which the damping factor q approaches the two limit values
0 and 1. When q → 0 and for several classes of graphs the distribution is a power law with exponent
2, regardless of the in-degree distribution. When q → 1 it can always be derived from the in-degree
distribution of the underlying graph, if the out-degree is the same for all nodes.

1 Introduction

Since the letter of Pearson [Pearson, 1905], published in Nature in 1905, the random walk has become a
central concept in many branches of the physical sciences. The number of applications and studies
dedicated to the subject is in fact so large that giving even a very partial list of references is an
overwhelming enterprise (see [Hughes, 1995] for a recent and fairly complete review). Most of the
attention, so far, has been devoted to the study of random walks and related stochastic processes on
d-dimensional Euclidean spaces and regular lattices, for their obvious relevance to physical problems.
Extending the definition of random walk to an arbitrary graph is trivial, but its study is relatively
less developed. In this paper we address the issue of the stationary probability of a random walk on a
directed scale-free graph.

The specific application we have in mind is the study of PageRank (PR), the measure that the
search engine Google (and several other search engines) employs to assess the prestige of Web pages.
When a user submits a query, the hits returned by Google are ranked according to their PR values. As
will be clear in a moment, such a measure is the stationary probability of a random walk on the Web
graph, where each node represents a Web page and edges represent the hyperlinks (naturally directed)
connecting the pages.

Let us consider an arbitrary undirected graph and a random walker moving on it. At any (discrete)
time step the walker jumps from the node where it is sitting to one of its neighbors, chosen with
equal probability. It is trivial to show that, at stationarity, the probability that a node is visited is
proportional to its degree, i.e. the number of neighbors of the node. If the graph is directed we have to
distinguish (see Fig. 1) the links adjacent to a node into incoming (those that point to the node) and
outgoing (those that point away from it). If the random walker is allowed to follow only the outgoing
links from the node where it presently is, the problem of finding the stationary probability is far more
complicated. Such probability will in general depend on the overall topological organization of the graph
itself, and cannot be expressed in terms of simple topological quantities like the degree of a node.

Figure 1: The node in the center has two incoming and three outgoing links.

In fact, due to the directedness of the links, the graph may have regions that the walker can enter but
not escape from. The stationary probability will be trivially concentrated in these regions. In order to
prevent this from happening we will consider a modified (directed) random walker whose behavior, when
it sits on a node i, is defined by the following two rules:

• with probability 1 − q the walker follows any outgoing link of i, chosen with equal probability;

• with probability q it moves to a generic node of the network (including i), chosen with equal
probability.

This suffices to ensure a non-zero stationary probability on every node. In the context of the Web
graph, the process described above can be thought of as a rough model of a Web surfer who
occasionally (with probability q) decides to interrupt his/her browsing and to restart it
from a randomly chosen page. The stationary probability of this process is exactly PR. To adhere to the
computer science terminology, we will refer to the probability q as the damping factor. The damping
factor adopted in real applications is generally small (q ≈ 0.15).
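The two rules above can be turned directly into a small Monte Carlo experiment. The following is only a sketch under stated assumptions: the function name `simulate_surfer`, the dictionary representation of the graph and the four-node toy graph are hypothetical choices made here for illustration, and every node is assumed to have at least one outgoing link. The visit frequencies it returns are a noisy estimate of PR.

```python
import random
from collections import Counter

def simulate_surfer(out_links, q, steps, seed=0):
    """Estimate the visit frequencies of the modified random walker.

    out_links: dict mapping each node to a non-empty list of successors.
    q: damping factor, i.e. the probability of jumping to a random node.
    """
    rng = random.Random(seed)
    nodes = list(out_links)
    current = rng.choice(nodes)
    visits = Counter()
    for _ in range(steps):
        if rng.random() < q:
            # Rule 2: move to any node of the network (current one included).
            current = rng.choice(nodes)
        else:
            # Rule 1: follow one of the outgoing links, chosen uniformly.
            current = rng.choice(out_links[current])
        visits[current] += 1
    return {v: visits[v] / steps for v in nodes}

# Hypothetical toy graph: node 3 only points to itself, so a pure random
# walk (q = 0) would end up trapped there.
toy = {0: [1, 2], 1: [2], 2: [3], 3: [3]}
freq = simulate_surfer(toy, q=0.15, steps=200_000)
print(freq)
```

With q > 0 every node of the toy graph receives a non-zero frequency, while node 3 still collects most of the visits.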

A brave analogy with the undirected case could lead to the hypothesis that PR is roughly proportional
to the in-degree of a node (number of incoming links), modulo corrections due to the small damping factor.
Such a view could be further supported by the observation that the distribution of PR for the real Web
has a power law decay [Pandurangan et al., 2002] characterized by an exponent 2.1 (see Fig. 2), like
the distribution of the in-degree [Albert et al., 2000] (note that, when referring to the Web and unless
otherwise specified, we always assume a damping factor of q ≈ 0.15). A direct measure of PR versus
in-degree on two large samples of the Web graph is shown in Fig. 3, where the value of PR has been
averaged over nodes with the same in-degree. The plot exhibits an almost linear behavior with deviations
at small degrees, where the effect of the damping factor is more relevant. Mean field calculations show
that there is a positive correlation between PR and in-degree [Fortunato et al., 2005] and a linear
relation between in-degree and the mean PR for nodes of equal in-degree can be safely assumed if the
degree correlations between adjacent nodes are weak. On a generic directed graph, the linear relationship
between PR (even if considered on average) and the in-degree is not guaranteed, and it depends on the
global organization of the graph itself. Addressing the issue of the PR distribution for an arbitrary graph
and a generic q would therefore require a case by case study. In this paper we concentrate on the two
interesting limits, i.e. q → 0 and q → 1, that show some degree of universality. In these two limits it is
possible to derive analytical expressions for the distribution of PR. For small q values, a master equation
approach allows us to solve the problem for special classes of networks. For q ≈ 1, it is possible to
establish a one-to-one correspondence between the distribution of PR and that of in-degree, as long as the
number of outgoing links from each node (out-degree) is the same. Further, to have better control over the
topological characteristics of the graph and how they correlate with the PR distribution, we work with
graphs generated by random processes or processes of growth.

Figure 2: PR distribution for a large sample of the Web graph, produced by the WebBase collaboration
in 2003 (www-diglib.stanford.edu/~testbed/doc2/WebBase/). The damping factor is q = 0.15.



2 PageRank 

Let us consider a generic directed network with n nodes. Let p(i) be the PR of node i. The vector p
satisfies the following self-consistent system of relations:

$$ p(i) = \frac{q}{n} + (1-q) \sum_{j:\, j \to i} \frac{p(j)}{k_{out}(j)}, \qquad i = 1, 2, \ldots, n \qquad (1) $$

where j → i indicates a link from j to i and k_out(j) is the out-degree of node j. In the following we
always assume that each node has at least one outgoing link, and therefore Eq. (1) is well defined.
Computing p amounts to solving the eigenvalue problem for the transition matrix M, whose element M_ij is
given by the expression:

$$ M_{ij} = \frac{q}{n} + (1-q)\, \frac{A_{ji}}{k_{out}(j)} \qquad (2) $$

where A is the adjacency matrix of the graph (A_ji = 1 if there is a link from j to i, otherwise
A_ji = 0).

The stationary probability of the process described by M is given by its principal eigenvector. Its
calculation is a standard problem of numerical analysis and can be achieved by repeatedly applying the
matrix M to a generic vector p_0 not orthogonal to p. It is easy to show, in fact, that
1 = λ_0 > λ_1 ≥ ... ≥ λ_n (the λ's being the eigenvalues of M), and therefore lim_{l→∞} M^l p_0 = p.
The powers of the matrix M introduce powers of the eigenvalues λ in the decomposition of p_0 in
eigenvectors of M. In this way, if we aim at calculating p with an accuracy ε, the vector M^l p_0
delivers p with corrections at most of the order λ_1^l, so we can safely stop the procedure when
λ_1^l ≈ ε, i.e. when l ≈ log(ε)/log(λ_1). In this way, the method converges rather quickly: in practical
applications, it turns out that fewer than one hundred iterations suffice to calculate the PR of a
network with 10^7 − 10^8 vertices.

Figure 3: PR versus in-degree for two samples of the Web graph, produced by the WebBase collaboration
in 2001 and 2003. The damping factor is q = 0.15.
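The iterative procedure described above can be sketched in a few lines. This is an illustration, not the code used for the figures: the function name `pagerank`, the dictionary representation and the toy graph are assumptions made here, and the loop stops when the L1 change between successive vectors drops below `eps`, which is used as a practical stand-in for the criterion λ_1^l ≈ ε.

```python
def pagerank(out_links, q=0.15, eps=1e-10, max_iter=1000):
    """Iterate the self-consistent PR relation until the vector stops changing.

    out_links: dict mapping each node to a non-empty list of successors.
    Returns the PR dictionary and the number of iterations performed.
    """
    nodes = list(out_links)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}           # generic starting vector p_0
    for iteration in range(1, max_iter + 1):
        new_p = {v: q / n for v in nodes}     # constant term q/n
        for j in nodes:
            # Node j passes a fraction (1 - q) of its PR to its successors.
            share = (1.0 - q) * p[j] / len(out_links[j])
            for i in out_links[j]:
                new_p[i] += share
        change = sum(abs(new_p[v] - p[v]) for v in nodes)
        p = new_p
        if change < eps:                      # assumed stopping rule (L1 change)
            break
    return p, iteration

# Hypothetical toy graph; node 3 has no incoming links, so its PR is q/n.
toy = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
p, iterations = pagerank(toy, q=0.15)
print(p, iterations)
```

Since every eigenvalue other than λ_0 is bounded in modulus by 1 − q, the number of iterations grows roughly like log(ε)/log(1 − q) in the worst case.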

PR, and therefore its distribution, depends on the damping factor q in a non-trivial way. A first
rigorous investigation of this problem was presented in [Boldi et al., 2005], with a focus on how the
ranking of pages is influenced by changing q; some closed-form expressions for the derivatives of PR with
respect to q were also derived there. The damping factor can be considered as an interpolation parameter
between a simple random walk and a pure scattering process. When q = 0, the process reduces to a simple
random walk, and one may end up with a trivial invariant measure concentrated on a small subset of nodes.
When q = 1, the walker can jump to any node at each step, with probability 1/n. The PR of all nodes is
then the same, and equals 1/n, as one can see by setting q = 1 in Eq. (1). The distribution of PR is
therefore a Dirac δ function centered at 1/n. For 0 < q < 1 the distribution is not trivial, and in general
it strongly depends on the underlying graph. On the other hand, in the two limits q → 0 and q → 1,
Eq. (1) assumes forms which lend themselves to simple analytical derivations and the PR distribution
can be exactly determined for a large set of graphs. It is worth remarking that the limit of small q is the
relevant one for Web applications.



3 The general case of a direct-loopless graph 

Given a generic directed graph, the PR of a specific node depends on the overall arrangement of the
graph and cannot be calculated on the basis of local properties only. In the following we focus on a wide,
although more restricted, class of networks for which analytical solutions are possible. To this class
belong networks obtained through a growth process, which are particularly important for real world
applications. Let us label the nodes of the network 1, 2, ..., n. We assume that if an oriented path from
node i to node j exists, there is no path from j to i (in other words, it is impossible to get back to a
given starting point following an oriented path of the graph). Networks that result from a growth process,
where new nodes are introduced at discrete time steps together with their new oriented links, obviously
belong to this class. In fact, we can label nodes according to their age (node 1 being the oldest) so that
a directed link from i to j may exist only if i > j.

It is easy to verify that the PR p(i) of a generic node i of a graph in this class can be written as
follows:

$$ p(i) = \frac{q}{n} \left[ 1 + \sum_{j} \sum_{P_{j \to i}} (1-q)^{d(P_{j \to i})} \prod_{s \in P_{j \to i}} \frac{1}{k_{out}(s)} \right] \qquad (3) $$

where the first sum runs over all nodes in the graph and the second over all paths P_{j→i} from a generic
node j to node i. Each path in the second sum is weighted by as many factors (1 − q) as there are links
along the path (d(P_{j→i}) is the length of path P_{j→i}) and is also weighted by the inverse of the
out-degree k_out(s) of each node s encountered along the path (the end node i excluded). Although correct
for any q, the formula above is not very transparent. In order to get some understanding of the expected
distribution of PageRank in a graph, we specialize Eq. (3) to the case in which each node has a fixed
number m of outgoing links. We further focus on the limit q → 0, which has an immediate interpretation in
terms of walks over the graph. Under the assumptions above, the expression for the PR of a node i
simplifies to

$$ p(i) = \frac{q}{n} \left[ 1 + \sum_{j} \sum_{P_{j \to i}} \left( \frac{1}{m} \right)^{d(P_{j \to i})} \right] \qquad (4) $$

In the following we show that if the graph is grown according to preferential attachment or to a copying
mechanism and q is sufficiently small, we should expect an algebraic distribution of PR characterized by
an exponent 2, for any value of m. We will give the proof in the case m = 1 and a hint to a general proof
in Sec. 3.4.

3.1 The limit q → 0

Let us suppose that q is very small (q ≈ 0) and can be treated as an infinitesimal. Eq. (1), to the first
order in q, reads:

$$ p(i) = \frac{q}{n} + \sum_{j:\, j \to i} \frac{p(j)}{k_{out}(j)}, \qquad i = 1, 2, \ldots, n \qquad (5) $$

where we have made the approximation 1 − q ≈ 1. The general expression in Eq. (3) guarantees that this
approximation leads to the exact result. Since m = 1 there cannot be more than one path between two
nodes and the network is an oriented tree. Under the assumption that nodes have out-degree 1, Eq. (5)
reads:

$$ p(i) = \frac{q}{n} + \sum_{j:\, j \to i} p(j), \qquad i = 1, 2, \ldots, n \qquad (6) $$

meaning that the PR of a node is the sum of a constant term (q/n) and the PR of its in-neighbors. In
Fig. 4 we show a subgraph of a tree. Node A is the root of the subgraph. A random walker moving from
any node in the subtree and constrained by the directions of the links will necessarily reach A. We
therefore call the nodes in the subtree predecessors of A (we include A among its predecessors). The three
empty circles are "leaves" of the subgraph, as they have no incoming links. Starting from the leaves, and
using Eq. (6) recursively, it is possible to calculate the PR of all nodes of the diagram. The values are
reported next to the nodes. The figure shows that

• all PR values are multiples of the elementary unit q/n;

• PR increases if one moves from a node to another by following a link;

• the PR of each node i, in units of q/n, equals the number of its predecessors.

Figure 4: Subgraph of a tree. A node A is shown together with all its predecessors.
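The last property can be checked on any small tree. The sketch below is only an illustration: the helper names and the seven-node tree are hypothetical, and nodes are assumed to be labelled by age so that each node points to an older one. It applies the small-q recursion (a node's PR is one unit of q/n plus the PR of its in-neighbors) from the leaves upward, and compares the result with a direct count of predecessors.

```python
def pr_units_on_tree(parent):
    """PR in units of q/n on a tree with out-degree 1 (small-q recursion).

    parent[v] is the node that v points to (None for the root, which sets
    no link). Labels are assumed to satisfy parent[v] < v.
    """
    n = len(parent)
    units = [1] * n                     # constant term: one unit q/n per node
    for v in range(n - 1, 0, -1):       # youngest first: in-neighbors are final
        units[parent[v]] += units[v]    # add the PR of the in-neighbor v
    return units

def count_predecessors(parent):
    """For each node, count the nodes that can reach it (itself included)."""
    counts = [0] * len(parent)
    for v in range(len(parent)):
        u = v
        while u is not None:            # walk along the links up to the root
            counts[u] += 1
            u = parent[u]
    return counts

# Hypothetical tree: 1 and 2 point to 0, 3 and 4 to 1, 5 to 2, 6 to 5.
parent = [None, 0, 0, 1, 1, 2, 5]
print(pr_units_on_tree(parent))    # [7, 3, 3, 1, 1, 2, 1]
print(count_predecessors(parent))  # [7, 3, 3, 1, 1, 2, 1]
```

The two lists coincide, and the leaves (nodes 3, 4 and 6) carry exactly one unit.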

In the following, PR is measured in units of q/n, and, accordingly, the probability distribution is written
as P_PR(l), with l = 1, 2, ..., n. When a new node N gets connected to a generic node of the subgraph of
Fig. 4, the PR of node A increases by q/n. Further, all the nodes on the path between N and A count N
as a predecessor and therefore they similarly increase their PR by q/n (Fig. 5). In the next subsections
we specialize the above to networks grown by a linear preferential attachment mechanism, either explicitly
(Barabási-Albert [Albert & Barabási, 1999] and Dorogovtsev et al. [Dorogovtsev & Mendes, 2000]) or
implicitly (Copying model [Kleinberg et al., 1999]).



3.2 Explicit preferential attachment 

In the model of Dorogovtsev et al. (DMS) [Dorogovtsev & Mendes, 2000], adapted to a directed graph,
the probability that a new node i attaches its link to a node j (with in-degree k_j) is

$$ \Pi(k_j, a) = \frac{a + k_j}{\sum_{l} (a + k_l)} \qquad (7) $$

i.e. it only depends on the in-degree of the target node and on a real constant a > 0 (the sum in the
denominator runs over the nodes already present in the network). Eq. (7) is a generalization of the
linking probability of the Barabási and Albert (BA) model [Albert & Barabási, 1999], which is recovered
when a coincides with the (fixed) out-degree m of the nodes (m = 1 in the present case). The derivation
below, therefore, encompasses the BA model as a particular case.

Figure 5: If a new node N gives its link to any node of the subgraph, the PR of the uppermost node A
will increase by q/n.

It is known that the DMS model leads to a scale-free in-degree distribution with exponent γ = 2 + a.
We start from a network with n nodes. The probability distribution of PR is, initially, P^n_PR(l). In
order to write a master equation that relates P^{n+1}_PR(l) to P^n_PR(l), one notes that the addition of
node n + 1 increases by q/n the PR of all nodes in the path between n + 1 and 1, while the others remain
unaffected. In this way, among the nodes of the path, PR l − 1 will become l, whereas PR l will become
l + 1. Let us consider a generic node i with PR equal to l. The probability Π^n_l that the new link will
change the PR of i from l to l + 1 is equal to the probability that the link is received by any
predecessor of i (including i), i.e.

$$ \Pi^n_l = \sum_{j \Rightarrow i} \frac{a + k_j}{\sum_{t=1}^{n} (a + k_t)} \qquad (8) $$

where j ⇒ i indicates that j is a predecessor of i. Note that even if other predecessors of i (besides
i itself) increase their PR due to the attachment of the new node, they cannot reach the value l + 1,
as their initial values are necessarily smaller than l. Since all nodes have out-degree m = 1, the total
number of links of a network with n nodes is n − 1 (we assume that the first node does not create links)
and the denominator of Eq. (8) takes the simple form

$$ \sum_{t=1}^{n} (a + k_t) = an + n - 1 = (a+1)n - 1. \qquad (9) $$

The number of predecessors of i is l, and the total number of their incoming links is l − 1
(see Fig. 4). One finally obtains:

$$ \Pi^n_l = \sum_{j \Rightarrow i} \frac{a + k_j}{(a+1)n - 1} = \frac{(a+1)l - 1}{(a+1)n - 1}. \qquad (10) $$

The probability Π^n(l) that the new link will alter the value of any node in the "PR class" l is then:

$$ \Pi^n(l) = n P^n_{PR}(l)\, \Pi^n_l = \frac{(a+1)l - 1}{(a+1) - 1/n}\, P^n_{PR}(l). \qquad (11) $$

The master equation then reads:

$$ (n+1) P^{n+1}_{PR}(l) - n P^n_{PR}(l) = \Pi^n(l-1) - \Pi^n(l). \qquad (12) $$

Eq. (12) is a balance equation; the left-hand side expresses the variation of the number of nodes in
the "PR class" l, after the addition of the (n + 1)-th node. The first term of the right-hand side is the
probability that the introduction of the new node increases the number of nodes in the "PR class" l by
one; the other term is the probability that a node leaves that class because its PR increases by
one unit. Since a single link is added at each iteration, only one node can make either transition, so
the right-hand side represents the expected variation in the population of nodes in the "PR class" l, i.e.
exactly what we have on the left-hand side of Eq. (12).

Note that Eq. (12) holds if l > 1. When l = 1, it must be modified, because there are no nodes with
zero PR and the first term on the right-hand side would be ill-defined. The modification, however, is
simple. The new node n + 1 is a "leaf", and it has PR 1. At each iteration, therefore, the population of
"PR class" 1 is increased by one. We have

$$ (n+1) P^{n+1}_{PR}(1) - n P^n_{PR}(1) = 1 - \Pi^n(1). \qquad (13) $$

We are interested in the stationary solutions of Eqs. (12) and (13), which can be derived by setting
P^{n+1}_PR(l) = P^n_PR(l) = P_PR(l) (valid in the limit when n → ∞). In this limit, one can safely neglect
1/n in Eq. (11). After rearranging terms we obtain:

$$ P_{PR}(l) = \begin{cases} \dfrac{a+1}{2a+1}, & \text{if } l = 1; \\[2ex] \dfrac{(a+1)l - a - 2}{(a+1)l + a}\, P_{PR}(l-1), & \text{if } l > 1; \end{cases} \qquad (14) $$

which leads to:

$$ P_{PR}(l) = \frac{a(a+1)}{[(a+1)l + a]\,[(a+1)l - 1]}, \qquad \text{for } l > 1. \qquad (15) $$

The probability distribution of PR for a network built according to the DMS model has a power law
tail with exponent β = 2, independently of a. Fig. 6 shows PR distributions obtained from numerical
simulations. They refer to three DMS networks, with parameter a = 1/2, 3, and 10^, respectively. The 
number of nodes is n = 10^ and q = 0.001. The tails of the three curves are straight lines in the double- 
logarithmic scale of the plot, indicating a power law decay, and they are parallel. The continuous line 
has the slope of the predicted trend, showing an excellent agreement. 

As noted above, our analytical result and the simulation for the largest value of a show that β is
independent of the parameter a, surprisingly in contrast with what happens for the in-degree, which, in the
limit a → ∞, turns out to have an exponential distribution. The networks whose PR distributions are shown
in the plot have been generated with m = 3. Fig. 6 then confirms that our result holds even when m > 1.

Figure 6: Small-q PR distribution for DMS networks.
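A simulation of this kind can be sketched as follows for the case m = 1. The code is an illustration under stated assumptions, not the program used for Fig. 6: the function names are hypothetical, PR is taken in the small-q limit as the number of predecessors (in units of q/n), and the prediction used for comparison is the closed form a(a+1)/{[(a+1)l + a][(a+1)l − 1]} obtained by iterating the stationary master equation.

```python
import random

def grow_dms_tree(n, a, seed=0):
    """Grow a DMS tree with out-degree 1; return the predecessor count per node.

    A new node links to node j with probability proportional to a + k_in(j).
    """
    rng = random.Random(seed)
    parent = [None]        # node 0 is the root and sets no link
    targets = []           # one entry per link: the node that link points to
    preds = [1]            # PR in units of q/n = number of predecessors
    for new in range(1, n):
        total = a * new + len(targets)     # sum of (a + k_in) over old nodes
        if rng.random() * total < a * new:
            j = rng.randrange(new)         # uniform part (weight a per node)
        else:
            j = rng.choice(targets)        # part proportional to in-degree
        parent.append(j)
        targets.append(j)
        preds.append(1)                    # the new node is a leaf
        while j is not None:               # every node on the path gains a unit
            preds[j] += 1
            j = parent[j]
    return preds

def predicted(l, a):
    """Closed-form stationary distribution quoted in the lead-in (assumed)."""
    return a * (a + 1) / (((a + 1) * l + a) * ((a + 1) * l - 1))

preds = grow_dms_tree(20000, a=1.0, seed=1)
for l in (1, 2, 3):
    print(l, preds.count(l) / len(preds), predicted(l, 1.0))
```

For a = 1 the fraction of leaves should approach (a + 1)/(2a + 1) = 2/3; a log-log histogram of `preds` would show the tail with slope close to 2.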



3.3 Implicit preferential attachment: the Copying model 

The Copying model (CM) [Kleinberg et al., 1999; Krapivsky & Redner, 2001] was originally introduced
to model the growth of the Web graph. It is based on the reasonable assumption that Web administrators,
in creating a new page, often "copy" hyperlinks of pages they know. In this framework, a newly created
node i is a copy of a randomly chosen existing node j. This implies that i sets links to all the neighbors
of j. Then, with probability a, those links are rewired to other nodes, again chosen at random. The
model produces a scale-free network with a power law in-degree distribution characterized by an exponent
γ = (2 − a)/(1 − a).

Although the linking mechanism is apparently unrelated to the degree of the target node, a closer
inspection reveals that the copying mechanism implies an effective linear preferential attachment
[Pastor-Satorras & Vespignani, 2004]. To derive the PR distribution, we follow closely the strategy of the
previous subsection.

In order to affect the PR of a node i, the link set by the new node n + 1 must again attach to a
predecessor of i. It is useful to distinguish between the "copying" phase and the "rewiring" phase of the
linking process.

In the copying phase, to affect the PR of i, the target node has to be a predecessor of i, excluding i
itself. After the rewiring phase, the node i will avail itself of a new contribution in PR if the new link
is untouched by the rewiring or rewired to another predecessor of i (this time including i itself). Let us
assume that node i is originally in "PR class" l. The probability to pick at random a predecessor of i is
l/n, if we include i, or (l − 1)/n, if we exclude i. So, the probability Π^n_l that the new link will change
the PR of i is:

$$ \Pi^n_l = (1-a)\,\frac{l-1}{n} + a\,\frac{l}{n} = \frac{l + a - 1}{n}. \qquad (16) $$

The a-dependent terms express the probability to have copying (1 − a) and rewiring (a). From Eq. (16)
one can extend the result to all nodes with PR l, like in Eq. (11):

$$ \Pi^n(l) = n P^n_{PR}(l)\, \Pi^n_l = (l + a - 1)\, P^n_{PR}(l). \qquad (17) $$

Plugging the expression of Π^n(l) in the balance equations (12) and (13), one obtains the following
stationary relation:

$$ P_{PR}(l) = \begin{cases} \dfrac{1}{1+a}, & \text{if } l = 1; \\[2ex] \dfrac{l + a - 2}{l + a}\, P_{PR}(l-1), & \text{if } l > 1. \end{cases} \qquad (18) $$

From the recursive relation of Eq. (18) the final expression for the PR distribution follows:

$$ P_{PR}(l) = \frac{a}{(l + a)(l + a - 1)}, \qquad \text{for } l > 1. \qquad (19) $$

Figure 7: Small-q PR distribution for CM networks.



The result is analogous to the one obtained in the previous section. Since, as mentioned above, the 
linking mechanism of the CM hides an effective linear preferential attachment, the result is not totally 
unexpected. 
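The step from the stationary recursion to the closed form is a telescoping product. The short derivation below is a sketch: it assumes the stationary balance equations with Π(l) = (l + a − 1) P_PR(l), and the shorthand g(k) = k + a − 1 is introduced here only for convenience.

```latex
\begin{align*}
P_{PR}(1) &= 1 - a\,P_{PR}(1) \;\Rightarrow\; P_{PR}(1) = \frac{1}{1+a},\\
P_{PR}(l) &= \frac{l+a-2}{l+a}\,P_{PR}(l-1)
           = \frac{g(l-1)}{g(l+1)}\,P_{PR}(l-1), \qquad g(k) \equiv k+a-1,\\
P_{PR}(l) &= P_{PR}(1)\prod_{k=2}^{l}\frac{g(k-1)}{g(k+1)}
           = \frac{1}{1+a}\,\frac{g(1)\,g(2)}{g(l)\,g(l+1)}
           = \frac{a}{(l+a)(l+a-1)} \;\sim\; \frac{a}{l^{2}} \quad (l \gg 1).
\end{align*}
```

All intermediate factors cancel in pairs, leaving only g(1) = a and g(2) = a + 1 in the numerator, which is why the tail exponent 2 does not depend on a.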

A numerical test of the prediction in Eq. (19) can be found in Fig. 7, where the PR distributions
for three networks built with the CM, with a equal to 0.1, 1/2 and 1, respectively, are shown. The
other relevant parameters are m = 3, n = 10^6 and q = 0.001. All the curves show the same slope (with
exponent 2) in a double-logarithmic plot. Note that the CM with a = 1 generates a network with an
exponential in-degree distribution, analogously to the DMS model in the limit a → ∞. Again, this fact
does not affect the PR distribution.



3.4 Hint to a general proof 

We now hint at the possibility of extending the proof presented above to the case m > 1. Let us work
in the preferential attachment framework. Starting from Eq. (4), we need to introduce the quantity
ρ_ij = Σ_{P_{j→i}} (1/m)^{d(P_{j→i})}, i.e. the contribution of node j to the PR in node i. This quantity,
obviously, does not change in time. The addition of a new node at time t (therefore the node is labelled
t) contributes, on average, to the PR in i
$$ \rho_{it} = \sum_{j \in n.n.(t)} \frac{k_j(t)}{2mt}\, \rho_{ij} \qquad (20) $$

where the sum runs over the nearest neighbors of the new node t and k_j(t) is the degree of node j at time
t. Taking the average over all realizations of the process of growth and the limit for continuous time, one
arrives at the following equation:

$$ \rho(t_0, t) = \int_{t_0}^{t} \rho(t_0, s)\, \frac{k(s,t)}{2mt}\, ds \qquad (21) $$

where ρ(t_0, t) is the average contribution to the PR of a node born at time t_0 from a node born at time
t, and k(s,t) is the average degree at time t of a node born at time s. If ρ(t_0, t) can be explicitly
found, then p_T(t_0), the average PR at a generic time T of a node born at time t_0, can be easily
calculated as p_T(t_0) = ∫_{t_0}^{T} ρ(t_0, t) dt. To compute ρ(t_0, t) we need k(s,t) first. In the
context of preferential attachment
k{s,t) is found to be k{s,t) = m(t/to)^^^ (this easily follows from dk/dt = k/2rat and k{to,to) = m). 
This expression provides the kernel for the integral equation (21). Once Eq. (21) is solved (taking into
account the correct boundary condition ρ(t_0, t_0) = 1) and the result properly integrated, it gives, in the
limit T >> to and g — > 0, pT(io) oc T^/"^ /t^, which in turn gives the expected result: PR is algebraically 
distributed with an exponent equal to 2 independently of m. In general the kernel k(s,t)/2tm in Eq. (21)
needs to be replaced by that appropriate to the growth model under consideration.

3.5 Beyond preferential attachment 

We have seen that the PR distribution for special networks has a power law tail with exponent 2,
independently of the in-degree distribution of the network, which need not even be a power law (e.g.
DMS model for a → ∞, CM for a → 1). This evidence, together with the observation that the PR
distribution for the real Web (where a relatively small q is usually employed) also has a power law
distribution with exponent close to 2, may erroneously lead to the conclusion that the above result
applies to a general graph.

A numerical test on a random graph à la Erdős-Rényi [Erdős & Rényi, 1959] shows the limits of
the validity of our result. An Erdős-Rényi graph is built starting from a set of n nodes, and setting a
link independently and with a probability r between any pair of nodes. The resulting network has a
Poissonian degree distribution, with mean rn. In order to make the graph directed, we orient the link
i-j with equal probability from i to j or from j to i. There is no "center" and no PR flux towards a
core of nodes, unlike the networks we have studied above. All nodes will thus have equal rights, and we
expect little difference in their PR values. Fig. 8 shows the PR distribution for a random graph with
50000 nodes and r = 0.0002; the damping factor q is 0.01. The distribution appears to be Poissonian,
like that of the in-degree.

It would be interesting to understand whether the result presented in this paper holds for all networks
in which random walkers stream towards a core of nodes. We expect the PR distribution to be a power
law quite generally, but we have no arguments hinting at a universal occurrence of the exponent 2.
Numerical evidence suggests, in fact, that other exponents are possible. In Fig. 9 we show the small-q
PR distribution for a citation network of U.S. patents (q = 0.001). Citation networks are practical
examples of the directed trees we have analyzed so far, as a new paper must necessarily cite older
papers. The data [Hall et al., 2001] refer to over 3 million U.S. patents granted between January 1963
and December 1999, and comprise all citations made to these patents between 1975 and 1999. The PR
distribution is skewed, as expected, but the slope of the tail is quite different from 2, being close to 3.

Figure 8: Small-q PR distribution for an Erdős-Rényi random graph.

4 The limit q → 1

When q = 1 all nodes have the same PR value 1/n. In the following we study the limit q → 1 but q ≠ 1.
In our Eq. (1), the constant q/n ≈ 1/n is now much larger than the sum on the right-hand side (we treat
1 − q as an infinitesimal). The PR distribution will then be very narrow and squeezed towards q/n, which
is not interesting. However, the sum over the neighbors in Eq. (1) determines the variable contribution
to PR, which is responsible for the differences in PR between the nodes. Therefore, we isolate this piece,
and call it reduced PageRank (RPR). So, the RPR p_r(i) of a node i is defined as



$$ p_r(i) = p(i) - \frac{q}{n}, \qquad i = 1, 2, \ldots, n. \qquad (22) $$

The RPR is the probability that, during the PR process, a node is visited by a walker coming through any
of its incoming links. One can show that the distribution of RPR coincides with the in-degree distribution
on every graph, provided the out-degree is a constant m. In this case, in fact, when we replace PR with
RPR through the relation (22), Eq. (1) assumes the following form

$$ p_r(i) = \frac{1-q}{m} \sum_{j:\, j \to i} \left[ p_r(j) + \frac{q}{n} \right] = \frac{q(1-q)}{mn}\, k_{in}(i) + \frac{1-q}{m} \sum_{j:\, j \to i} p_r(j), \qquad (23) $$

where k_in(i) is the in-degree of i. From Eq. (23) it follows that the RPR of a node is of order 1 − q, so
that the terms coming from the sum are of order (1 − q)^2 and can be safely neglected. Finally,

$$ p_r(i) \approx \frac{q(1-q)}{mn}\, k_{in}(i), \qquad i = 1, 2, \ldots, n. \qquad (24) $$



The RPR of a node is then proportional to its in-degree, and the corresponding distributions coincide,
under no assumption other than that the out-degree is a constant. Therefore, the result has a wide
generality. It is also intuitive how to extend it to the case in which the out-degree is not constant but
approximately the same for all nodes. Out-degree distributions concentrated about some value, like
Gaussians, Poissonians, exponentials, etc., should not change the result.
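The proportionality between RPR and in-degree is easy to probe numerically. The sketch below is an illustration only: the function name, the way the test graph is built (every node points to m nodes drawn at random, self-links allowed) and the parameter values are assumptions made here, not the networks of the paper. It computes PR by power iteration at q close to 1 and compares p(i) − q/n with q(1 − q) k_in(i)/(mn).

```python
import random

def reduced_pagerank_check(n=2000, m=3, q=0.999, seed=0, iters=50):
    """Return (RPR, predicted RPR) for a random graph with constant out-degree m."""
    rng = random.Random(seed)
    # Hypothetical test graph: each node points to m distinct random nodes.
    out_links = [rng.sample(range(n), m) for _ in range(n)]
    k_in = [0] * n
    for u in range(n):
        for v in out_links[u]:
            k_in[v] += 1
    p = [1.0 / n] * n
    for _ in range(iters):                   # power iteration on the PR relation
        new_p = [q / n] * n
        for u in range(n):
            share = (1.0 - q) * p[u] / m
            for v in out_links[u]:
                new_p[v] += share
        p = new_p
    rpr = [p[i] - q / n for i in range(n)]   # reduced PageRank
    approx = [q * (1.0 - q) * k_in[i] / (m * n) for i in range(n)]
    return rpr, approx

rpr, approx = reduced_pagerank_check()
worst = max(abs(r - a) / a for r, a in zip(rpr, approx) if a > 0)
print("largest relative deviation:", worst)
```

The deviation is of relative order 1 − q, in line with the neglected second-order terms; it holds node by node, not only on average.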
Figure 9: Small-q PR distribution for a citation network of U.S. patents. The continuous line is a power
law fit of the tail.

Fig. 10 shows a test of Eq. (24). Each of the four plots is a scatter plot relative to a different
network; three of them are scale-free and one has an exponential in-degree distribution (bottom right),
as it has been generated with a CM process for a = 1. The RPR of a generic node is compared with
the right-hand side of Eq. (24). The continuous line represents the equality of the two variables. The
agreement with the data points is excellent in all cases.

After the submission of the paper we realized that the result of this section had already been derived
and presented in [Chen et al., 2006]. We apologize to Chen and colleagues for the unfortunate accident.

5 Conclusions 

Since the birth of Google, PR has attracted a lot of interest from the scientific community, but the
deep reasons behind its capacity to capture "quality" better than other, more widely used topological
descriptors (e.g. in-degree) are not yet clear. We studied PR in a more general framework than its
original field of application (the Web graph). We derived some exact results for PR distributions in the
limit when the damping factor q approaches the two extreme values 0 and 1. When q → 0, for networks
without directed loops and where walkers stream towards a central core of nodes (roots), PR can in
principle be calculated in a single sweep over the nodes, starting from the leaves and converging shell-wise
towards the center. This feature allowed us to calculate exactly the distribution of PR for networks built
according to some peculiar linking strategies, like that of the DMS model (which includes the BA model
as a special case) and of the CM. In these cases, the PR distribution has a power law tail with exponent
2, for any choice of the model parameters, which, on the contrary, strongly affect the in-degree
distribution. This possibly suggests that the PR process diversifies the roles of the different nodes much
more than in-degree does, and that it is a better criterion to rank nodes. Many networks have the features
that guarantee, to a first approximation, the applicability of our results. Networks grown about one or
more centers, with new nodes pointing mostly to older nodes, belong to this class. The Web itself could be
taken as an example of this kind of network. The PR distribution of the Web graph is usually calculated
for q = 0.15, which is quite close to zero, showing an exponent indeed very close to 2 (see Fig. 2). Work
is in progress to determine the broadest conditions that yield this "universal" behavior.
Figure 10: Numerical test of Eq. (24) for networks built according to different growth models. The
number of nodes is n = 10^, m = 3 and q = 0.999. Top left: BA model. Top right: DMS model for
a = 3/2. Bottom left: CM for a = 0.1. Bottom right: CM for a = 1.

In the limit q → 1, PR is a linear function of in-degree, as long as the out-degree of the nodes is
fixed. The relation holds at the level of the single node, and not merely in the statistical sense. We plan 
to investigate how general this result is by relaxing the assumption of constant out-degree and trying 
various distributions. 

To summarize, the PR distribution strongly depends on the value of the damping factor q and is in general
"uncorrelated" with the corresponding in-degree distribution, but depends on the overall topological
organization of the graph. This is not in contradiction with the findings of [Fortunato et al., 2005], where
a correlation between the two variables was observed, because the correlation involves the in-degree and
the mean PR value of all nodes with that in-degree. Within each in-degree class PR has large fluctuations.